Euclidean k-means
Clustering Stable Instances of Euclidean k-means.
The Euclidean k-means problem is arguably the most widely studied clustering problem in machine learning. While the k-means objective is NP-hard in the worst case, practitioners have enjoyed remarkable success in applying heuristics like Lloyd's algorithm to this problem. To address this disconnect, we study the following question: what properties of real-world instances will enable us to design efficient algorithms and prove guarantees for finding the optimal clustering? We consider a natural notion called additive perturbation stability that we believe captures many practical instances of Euclidean k-means clustering. Stable instances have a unique optimal k-means solution that does not change even when each point is perturbed a little (in Euclidean distance). This captures the property that the optimal k-means solution should be tolerant to measurement errors and uncertainty in the points. We design efficient algorithms that provably recover the optimal clustering for instances that are additive perturbation stable. When the instance has some additional separation, we design a simple, efficient algorithm with provable guarantees that is also robust to outliers. We also complement these results by studying the amount of stability in real datasets and demonstrating that our algorithm performs well on these benchmark datasets.
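A minimal way to probe the stability notion on toy data, assuming pure Python and instances small enough to enumerate every labeling: compute the optimal k-means partition by exhaustive search, move each point by a random vector of Euclidean length at most ε, and check whether the optimal partition changes. This is not the authors' algorithm, and the helper names are hypothetical. Random sampling can only refute stability, never certify it, because the definition quantifies over all perturbations.

```python
import itertools
import math
import random


def kmeans_cost(points, labels, k):
    """k-means objective: sum of squared Euclidean distances to cluster means."""
    cost = 0.0
    for c in range(k):
        members = [p for p, lab in zip(points, labels) if lab == c]
        if not members:
            continue
        dim = len(members[0])
        mean = [sum(p[i] for p in members) / len(members) for i in range(dim)]
        cost += sum(sum((p[i] - mean[i]) ** 2 for i in range(dim)) for p in members)
    return cost


def optimal_partition(points, k):
    """Exhaustive search over all k**n labelings (toy sizes only).

    Returns the best partition as a frozenset of frozensets of point indices,
    so partitions that differ only by cluster names compare equal.
    """
    best_cost, best_labels = math.inf, None
    for labels in itertools.product(range(k), repeat=len(points)):
        if len(set(labels)) < k:  # insist on k non-empty clusters
            continue
        cost = kmeans_cost(points, labels, k)
        if cost < best_cost - 1e-12:
            best_cost, best_labels = cost, labels
    return frozenset(
        frozenset(i for i, lab in enumerate(best_labels) if lab == c) for c in range(k)
    )


def perturb(points, eps, rng):
    """Move every point by a random vector of Euclidean length at most eps."""
    moved = []
    for p in points:
        direction = [rng.gauss(0.0, 1.0) for _ in p]
        norm = math.sqrt(sum(d * d for d in direction)) or 1.0
        radius = eps * rng.random()
        moved.append(tuple(x + radius * d / norm for x, d in zip(p, direction)))
    return moved


def survives_random_perturbations(points, k, eps, trials=50, seed=0):
    """Monte Carlo probe: False if some sampled eps-perturbation changes the
    optimal partition. True does NOT certify additive perturbation stability."""
    rng = random.Random(seed)
    reference = optimal_partition(points, k)
    for _ in range(trials):
        if optimal_partition(perturb(points, eps, rng), k) != reference:
            return False
    return True
```

For example, two far-apart triangles such as (0, 0), (0, 1), (1, 0) and (10, 10), (10, 11), (11, 10) keep the same optimal 2-means partition under perturbations of size 0.5. Three equally spaced points on a line have two tied optimal 2-means partitions, so they have no unique optimum, and the probe is expected to report a change for any ε > 0.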
Reviews: Clustering Stable Instances of Euclidean k-means.
The authors propose a notion of additive perturbation stability (APS) for Euclidean k-means instances: the optimal k-means clustering is maintained when each point in the data is moved by a sufficiently small Euclidean distance. I think the paper is rather interesting; however, its results are not very surprising. Here are my comments on the paper: (1) To my understanding, Theorem 1.2 assumes only the APS condition. Its results only hold for the case of k = 2 components and may lead to an exponential dependence on k for large k. However, under an additional margin condition between every pair of clusters, we are able to guarantee the existence of an algorithm that is polynomial in k.
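To make a "margin condition between every pair of clusters" concrete, the sketch below computes one possible pairwise margin for a given clustering: for each pair of clusters, the smallest signed distance of their points to the perpendicular bisector of the two cluster means. This particular definition and the helper names are assumptions for illustration; the paper's exact separation condition may differ.

```python
import itertools
import math


def _mean(points):
    dim = len(points[0])
    return [sum(p[i] for p in points) / len(points) for i in range(dim)]


def _signed_distance(p, midpoint, unit_normal):
    """Signed distance from p to the hyperplane through midpoint with normal unit_normal."""
    return sum((x - m) * u for x, m, u in zip(p, midpoint, unit_normal))


def pairwise_bisector_margin(clusters):
    """Smallest signed distance, over all pairs of clusters, from a point to the
    perpendicular bisector of the two cluster means (positive = on its own side).

    A negative value means some point lies closer to another cluster's mean.
    Assumes all cluster means are distinct. Hypothetical proxy, not the paper's condition.
    """
    means = [_mean(c) for c in clusters]
    worst = math.inf
    for i, j in itertools.combinations(range(len(clusters)), 2):
        diff = [a - b for a, b in zip(means[i], means[j])]
        norm = math.sqrt(sum(d * d for d in diff))
        unit = [d / norm for d in diff]  # points from mean j towards mean i
        mid = [(a + b) / 2 for a, b in zip(means[i], means[j])]
        worst = min(
            worst,
            min(_signed_distance(p, mid, unit) for p in clusters[i]),
            min(-_signed_distance(p, mid, unit) for p in clusters[j]),
        )
    return worst
```

A clustering with a large positive value has every point well inside the Voronoi cell of its own mean relative to each other cluster; a negative value flags a point on the wrong side of some bisector.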
Clustering Stable Instances of Euclidean k-means.
Aravindan Vijayaraghavan, Abhratanu Dutta, Alex Wang
k-means on Positive Definite Matrices, and an Application to Clustering in Radar Image Sequences
Daniel Fryer, Hien Nguyen, Pascal Castellazzi
… correspond bijectively to mean-centered Gaussian distributions and are used to model Brownian motion in Diffusion Tensor Imaging (DTI), where they are referred to as tensors [1]. The finite-lag autocovariance matrices of time series are SPD and have been used in compression-based clustering … However, performing k-means on SPD matrices may be difficult without a computationally efficient form for the Fréchet mean [13]. … In Section II, we introduce the log-Cholesky distance and … closed-form expression for the corresponding Fréchet mean.
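The excerpt does not reproduce the definitions from Section II, so the following is a sketch under the commonly used form of the log-Cholesky distance, which is an assumption here: write P = L Lᵀ with L the lower-triangular Cholesky factor, and take the Frobenius distance between the coordinates (strictly lower part of L) + diag(log diag(L)). Because these coordinates live in a flat space, the Fréchet mean has a closed form (arithmetic mean of the strictly lower parts, geometric mean of the Cholesky diagonals), and k-means on SPD matrices reduces to a Lloyd-style iteration in coordinate space. NumPy is assumed, and all function names are hypothetical.

```python
import numpy as np


def log_cholesky_coords(P):
    """Map SPD P = L L^T to strict_lower(L) + diag(log(diag(L)))."""
    L = np.linalg.cholesky(P)
    return np.tril(L, -1) + np.diag(np.log(np.diag(L)))


def from_log_cholesky_coords(X):
    """Inverse map: rebuild the Cholesky factor from coordinates and return L L^T."""
    L = np.tril(X, -1) + np.diag(np.exp(np.diag(X)))
    return L @ L.T


def log_cholesky_distance(P, Q):
    """Frobenius distance between log-Cholesky coordinates."""
    return np.linalg.norm(log_cholesky_coords(P) - log_cholesky_coords(Q))


def frechet_mean(mats):
    """Closed-form mean: average the coordinates, then map back to an SPD matrix."""
    coords = np.mean([log_cholesky_coords(P) for P in mats], axis=0)
    return from_log_cholesky_coords(coords)


def spd_kmeans(mats, k, iters=50, seed=0):
    """Lloyd-style k-means on SPD matrices under the log-Cholesky distance."""
    rng = np.random.default_rng(seed)
    X = np.array([log_cholesky_coords(P) for P in mats])  # shape (n, d, d)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Squared log-Cholesky distance from every matrix to every center.
        d2 = ((X[:, None] - centers[None]) ** 2).sum(axis=(2, 3))
        labels = d2.argmin(axis=1)
        # Coordinate-space mean of each cluster = Fréchet mean in coordinates;
        # an empty cluster keeps its previous center.
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, [from_log_cholesky_coords(C) for C in centers]
```

Working in coordinates keeps each update step as cheap as ordinary Euclidean k-means; only the final centers are mapped back to SPD matrices.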